Overview

Dataset statistics

Number of variables 6
Number of observations 125497040
Missing cells 21657651
Missing cells (%) 2.9%
Duplicate rows 0
Duplicate rows (%) 0.0%
Total size in memory 3.7 GiB
Average record size in memory 32.0 B
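
The derived figures above can be checked by hand. The following is a minimal sketch that recomputes the missing-cell percentage and the total memory size from the reported counts. The variable names are illustrative, and it assumes the total size is simply the average record size multiplied by the number of observations.

```python
# Values copied from the dataset statistics above.
n_variables = 6
n_observations = 125_497_040
missing_cells = 21_657_651
avg_record_bytes = 32.0

# Missing cells as a share of all cells (rows x columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: total size = average record size x number of rows.
total_gib = avg_record_bytes * n_observations / 2**30

print(f"Missing cells: {missing_pct:.1f}%")  # 2.9%
print(f"Total size: {total_gib:.1f} GiB")    # 3.7 GiB
```

Note that the 2.9% is measured against all cells in the dataset, whereas the 17.3% reported for onpromotion is measured against the rows of that single column.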

Variable types

Numeric 4
DateTime 1
Boolean 1

Alerts

onpromotion is highly imbalanced (61.5%) Imbalance
onpromotion has 21657651 (17.3%) missing values Missing
unit_sales is highly skewed (γ₁ = 582.2246437) Skewed
id is uniformly distributed Uniform
id has unique values Unique
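
The 61.5% imbalance figure for onpromotion is consistent with a score defined as one minus the normalised Shannon entropy of the class frequencies, computed over non-missing values only. That definition is an assumption made here for illustration (the report itself does not state the formula), and the function name below is hypothetical.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy (in bits)
    of the class frequencies and k is the number of classes.
    0 means perfectly balanced; values near 1 mean one class dominates."""
    if len(counts) < 2:
        return 0.0
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(len(counts))

# Non-missing onpromotion counts from this report: False, True.
score = imbalance_score([96_028_767, 7_810_622])
print(f"{score:.1%}")  # 61.5%
```

Under this definition the score depends only on the False/True split, so the 21657651 missing values raise a separate alert rather than affecting the imbalance figure.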

Reproduction

Analysis started 2026-01-06 22:34:19.856455
Analysis finished 2026-01-06 22:42:03.358105
Duration 7 minutes and 43.5 seconds
Software version ydata-profiling v4.18.0
Download configuration config.json
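
As a quick arithmetic check, the reported duration follows directly from the two timestamps. The sketch below uses only the standard library; the variable names are illustrative.

```python
from datetime import datetime

# Timestamps copied from the reproduction details above.
started = datetime.fromisoformat("2026-01-06 22:34:19.856455")
finished = datetime.fromisoformat("2026-01-06 22:42:03.358105")

elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)
print(f"{int(minutes)} minutes and {seconds:.1f} seconds")
# 7 minutes and 43.5 seconds
```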

Variables

id
Real number (ℝ)

Uniform  Unique 

Distinct 125497040
Distinct (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 62748520
Minimum 0
Maximum 1.2549704 × 10⁸
Zeros 1
Zeros (%) < 0.1%
Negative 0
Negative (%) 0.0%
Memory size 957.5 MiB

Quantile statistics

Minimum 0
5-th percentile 6274852
Q1 31374260
median 62748520
Q3 94122779
95-th percentile 1.1922219 × 10⁸
Maximum 1.2549704 × 10⁸
Range 1.2549704 × 10⁸
Interquartile range (IQR) 62748520

Descriptive statistics

Standard deviation 36227875
Coefficient of variation (CV) 0.57735028
Kurtosis -1.2
Mean 62748520
Median Absolute Deviation (MAD) 31374260
Skewness -9.4342576 × 10⁻¹⁷
Sum 7.8747535 × 10¹⁵
Variance 1.3124589 × 10¹⁵
Monotonicity Strictly increasing
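
If the ids are integers, then a strictly increasing column with minimum 0 and one distinct value per row is exactly the sequence 0, 1, ..., n − 1, so several of the statistics above have closed forms. The sketch below recomputes them. It assumes the reported variance is the sample variance (ddof = 1), and the names are illustrative.

```python
import math

n = 125_497_040  # number of observations; ids assumed to be 0 .. n-1

total = n * (n - 1) // 2       # sum of 0 .. n-1
mean = (n - 1) / 2             # midpoint of the sequence
variance = n * (n + 1) / 12    # sample variance (ddof=1), an assumption
std = math.sqrt(variance)
cv = std / mean                # approaches 1/sqrt(3) for large n

print(f"Sum      {total:.7e}")
print(f"Mean     {mean:.1f}")
print(f"Variance {variance:.7e}")
print(f"Std      {std:.0f}")
print(f"CV       {cv:.8f}")
```

The kurtosis of -1.2 and the near-zero skewness are likewise what a uniform distribution produces, which is why the report flags id as Uniform and Unique.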
Histogram with fixed size bins (bins=50)
Common values

Value Count Frequency (%)
0 1 < 0.1%
1 1 < 0.1%
2 1 < 0.1%
3 1 < 0.1%
4 1 < 0.1%
5 1 < 0.1%
6 1 < 0.1%
7 1 < 0.1%
8 1 < 0.1%
9 1 < 0.1%
Other values (125497030) 125497030 > 99.9%

Minimum 10 values

Value Count Frequency (%)
0 1 < 0.1%
1 1 < 0.1%
2 1 < 0.1%
3 1 < 0.1%
4 1 < 0.1%
5 1 < 0.1%
6 1 < 0.1%
7 1 < 0.1%
8 1 < 0.1%
9 1 < 0.1%

Maximum 10 values

Value Count Frequency (%)
125497039 1 < 0.1%
125497038 1 < 0.1%
125497037 1 < 0.1%
125497036 1 < 0.1%
125497035 1 < 0.1%
125497034 1 < 0.1%
125497033 1 < 0.1%
125497032 1 < 0.1%
125497031 1 < 0.1%
125497030 1 < 0.1%

date
Date

Distinct 1684
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 957.5 MiB
Minimum 2013-01-01 00:00:00
Maximum 2017-08-15 00:00:00
Invalid dates 0
Invalid dates (%) 0.0%
Histogram with fixed size bins (bins=50)

store_nbr
Real number (ℝ)

Distinct 54
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 27.464578
Minimum 1
Maximum 54
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 239.4 MiB

Quantile statistics

Minimum 1
5-th percentile 3
Q1 12
median 28
Q3 43
95-th percentile 51
Maximum 54
Range 53
Interquartile range (IQR) 31

Descriptive statistics

Standard deviation 16.33051
Coefficient of variation (CV) 0.59460263
Kurtosis -1.3568904
Mean 27.464578
Median Absolute Deviation (MAD) 16
Skewness -0.074194851
Sum 3.4467232 × 10⁹
Variance 266.68557
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value Count Frequency (%)
44 3513089 2.8%
45 3484244 2.8%
47 3457407 2.8%
3 3401264 2.7%
46 3353890 2.7%
49 3342531 2.7%
8 3261184 2.6%
48 3236523 2.6%
50 3192566 2.5%
6 3089799 2.5%
Other values (44) 92164543 73.4%

Minimum 10 values

Value Count Frequency (%)
1 2562153 2.0%
2 2987840 2.4%
3 3401264 2.7%
4 2830554 2.3%
5 2666691 2.1%
6 3089799 2.5%
7 2921204 2.3%
8 3261184 2.6%
9 2773790 2.2%
10 1740482 1.4%

Maximum 10 values

Value Count Frequency (%)
54 1648867 1.3%
53 1938255 1.5%
52 290581 0.2%
51 2960031 2.4%
50 3192566 2.5%
49 3342531 2.7%
48 3236523 2.6%
47 3457407 2.8%
46 3353890 2.7%
45 3484244 2.8%

item_nbr
Real number (ℝ)

Distinct 4036
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 972769.15
Minimum 96995
Maximum 2127114
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 478.7 MiB

Quantile statistics

Minimum 96995
5-th percentile 177395
Q1 522383
median 959500
Q3 1354380
95-th percentile 1964356
Maximum 2127114
Range 2030119
Interquartile range (IQR) 831997

Descriptive statistics

Standard deviation 520533.6
Coefficient of variation (CV) 0.53510496
Kurtosis -0.78499653
Mean 972769.15
Median Absolute Deviation (MAD) 404376
Skewness 0.21928968
Sum 1.2207965 × 10¹⁴
Variance 2.7095523 × 10¹¹
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value Count Frequency (%)
502331 83475 0.1%
314384 83450 0.1%
364606 83308 0.1%
265559 83047 0.1%
559870 82513 0.1%
1036689 82134 0.1%
273528 82108 0.1%
564533 82086 0.1%
261052 81774 0.1%
414353 81755 0.1%
Other values (4026) 124671390 99.3%

Minimum 10 values

Value Count Frequency (%)
96995 5229 < 0.1%
99197 4902 < 0.1%
103501 35841 < 0.1%
103520 53175 < 0.1%
103665 50449 < 0.1%
105574 40322 < 0.1%
105575 41311 < 0.1%
105576 39959 < 0.1%
105577 30113 < 0.1%
105693 51730 < 0.1%

Maximum 10 values

Value Count Frequency (%)
2127114 247 < 0.1%
2126944 5 < 0.1%
2126842 12 < 0.1%
2124052 704 < 0.1%
2123863 12 < 0.1%
2123859 10 < 0.1%
2123839 13 < 0.1%
2123791 21 < 0.1%
2123790 8 < 0.1%
2123775 64 < 0.1%

unit_sales
Real number (ℝ)

Skewed 

Distinct 258474
Distinct (%) 0.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 8.5548653
Minimum -15372
Maximum 89440
Zeros 0
Zeros (%) 0.0%
Negative 7795
Negative (%) < 0.1%
Memory size 957.5 MiB

Quantile statistics

Minimum -15372
5-th percentile 1
Q1 2
median 4
Q3 9
95-th percentile 29
Maximum 89440
Range 104812
Interquartile range (IQR) 7

Descriptive statistics

Standard deviation 23.605152
Coefficient of variation (CV) 2.7592663
Kurtosis 1796939.4
Mean 8.5548653
Median Absolute Deviation (MAD) 3
Skewness 582.22464
Sum 1.0736103 × 10⁹
Variance 557.20319
Monotonicity Not monotonic
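
The unit_sales figures are internally consistent, and a little arithmetic on them shows why the skewness alert fires. The sketch below works only from the values reported above. The variable names are illustrative, and the z-score of the maximum is an extra diagnostic added here, not something the report computes.

```python
# Values copied from the unit_sales statistics above.
minimum, maximum = -15372, 89440
q1, median, q3 = 2, 4, 9
mean, std = 8.5548653, 23.605152

value_range = maximum - minimum  # 104812, matches Range
iqr = q3 - q1                    # 7, matches IQR
cv = std / mean                  # about 2.759, matches CV
variance = std ** 2              # about 557.2, matches Variance

# How far the largest sale sits from the mean, in standard deviations.
max_z = (maximum - mean) / std

print(f"mean / median = {mean / median:.2f}")
print(f"max z-score   = {max_z:.0f}")
```

The mean is roughly twice the median, and the maximum lies thousands of standard deviations above the mean, so a handful of extreme rows dominate the third and fourth moments.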
Histogram with fixed size bins (bins=50)
Common values

Value Count Frequency (%)
1 23444825 18.7%
2 17749070 14.1%
3 13263841 10.6%
4 10216998 8.1%
5 7958957 6.3%
6 6423645 5.1%
7 5078334 4.0%
8 4163234 3.3%
9 3403350 2.7%
10 2879594 2.3%
Other values (258464) 30915192 24.6%

Minimum 10 values

Value Count Frequency (%)
-15372 1 < 0.1%
-10002 1 < 0.1%
-4673 1 < 0.1%
-3606 1 < 0.1%
-3600 1 < 0.1%
-3451.363 1 < 0.1%
-2487 1 < 0.1%
-2400 2 < 0.1%
-1943 1 < 0.1%
-1806 1 < 0.1%

Maximum 10 values

Value Count Frequency (%)
89440 1 < 0.1%
44142 1 < 0.1%
30000 1 < 0.1%
20748 1 < 0.1%
20000 1 < 0.1%
17146 1 < 0.1%
16000 1 < 0.1%
15375 1 < 0.1%
15000 1 < 0.1%
14483 1 < 0.1%

onpromotion
Boolean

Imbalance  Missing 

Distinct 2
Distinct (%) < 0.1%
Missing 21657651
Missing (%) 17.3%
Memory size 239.4 MiB
Value Count Frequency (%)
False 96028767 76.5%
True 7810622 6.2%
(Missing) 21657651 17.3%

Interactions

[Figure: 16 pairwise interaction plots]

Correlations

id item_nbr onpromotion store_nbr unit_sales
id 1.000 0.302 0.152 0.023 -0.050
item_nbr 0.302 1.000 0.073 0.014 -0.004
onpromotion 0.152 0.073 1.000 0.024 0.001
store_nbr 0.023 0.014 0.024 1.000 0.079
unit_sales -0.050 -0.004 0.001 0.079 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

id date store_nbr item_nbr unit_sales onpromotion
0 0 2013-01-01 25 103665 7.0 <NA>
1 1 2013-01-01 25 105574 1.0 <NA>
2 2 2013-01-01 25 105575 2.0 <NA>
3 3 2013-01-01 25 108079 1.0 <NA>
4 4 2013-01-01 25 108701 1.0 <NA>
5 5 2013-01-01 25 108786 3.0 <NA>
6 6 2013-01-01 25 108797 1.0 <NA>
7 7 2013-01-01 25 108952 1.0 <NA>
8 8 2013-01-01 25 111397 13.0 <NA>
9 9 2013-01-01 25 114790 3.0 <NA>

Last rows

id date store_nbr item_nbr unit_sales onpromotion
125497030 125497030 2017-08-15 54 2086882 1.0 False
125497031 125497031 2017-08-15 54 2087409 3.0 False
125497032 125497032 2017-08-15 54 2087978 8.0 False
125497033 125497033 2017-08-15 54 2088922 7.0 False
125497034 125497034 2017-08-15 54 2089036 4.0 False
125497035 125497035 2017-08-15 54 2089339 4.0 False
125497036 125497036 2017-08-15 54 2106464 1.0 True
125497037 125497037 2017-08-15 54 2110456 192.0 False
125497038 125497038 2017-08-15 54 2113914 198.0 True
125497039 125497039 2017-08-15 54 2116416 2.0 False